Semantic Wordification of Document Collections
Authors
Abstract
Word clouds have become one of the most widely accepted visual resources for document analysis and visualization, motivating the development of several methods for building layouts of keywords extracted from textual data. Existing methods convey content effectively, but they cannot preserve semantic relationships among keywords while also linking the word cloud to the underlying document groups that generated it. Such a representation is highly desirable for exploratory analysis of document collections. In this paper we present a novel approach to building document clouds, named ProjCloud, that aims to solve both problems: semantic layout and linking with document sets. ProjCloud generates a semantically consistent layout from a set of documents. Through a multidimensional projection, it is possible to visualize the neighborhood relationships between highly related documents and their corresponding word clouds simultaneously. Additionally, we propose a new algorithm for building word clouds inside polygons, which employs spectral sorting to maintain the semantic relationships among words. The effectiveness and flexibility of our methodology are confirmed by comparisons with existing methods. The technique automatically constructs projection-based layouts that the user may choose to examine as point clouds or as the corresponding word clouds, allowing a high degree of control over the exploratory process.
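The abstract states only that spectral sorting is used to keep related words close together; it does not give the procedure. As a rough illustration, the sketch below assumes one common form of spectral sorting: ordering items by the Fiedler vector (the eigenvector for the second-smallest eigenvalue) of a graph Laplacian built from a word-similarity matrix. The function name `spectral_order`, the toy similarity matrix, and the use of shifted power iteration to find the eigenvector are all assumptions made for this example, not ProjCloud's actual implementation.

```python
def spectral_order(similarity, iterations=500):
    """Order items by the Fiedler vector of the similarity graph's Laplacian.

    Assumes a symmetric, non-negative similarity matrix (list of lists)
    describing a connected graph. Hypothetical helper for illustration.
    """
    n = len(similarity)
    if n < 3:
        return list(range(n))
    # Off-diagonal weights W and node degrees D (self-similarity is ignored).
    w = [[similarity[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]
    # Gershgorin bound: every eigenvalue of L = D - W is <= 2 * max degree,
    # so (shift * I - L) has non-negative eigenvalues.
    shift = 2.0 * max(degree)

    # Start from a vector orthogonal to the constant vector. The iteration
    # assumes this start is not orthogonal to the Fiedler vector as well.
    x = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # y = (shift * I - L) x
        y = [(shift - degree[i]) * x[i] + sum(w[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # remove the constant eigenvector component
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    # Sorting by the Fiedler vector places strongly connected items side by side.
    return sorted(range(n), key=lambda i: x[i])


# Toy data (made up): two groups of three words, similar within each group.
words = ["cloud", "layout", "keyword", "gene", "protein", "cell"]
sim = [[0.9 if (i < 3) == (j < 3) else 0.1 for j in range(6)] for i in range(6)]
order = spectral_order(sim)
print([words[i] for i in order])
```

In this toy case the resulting order keeps each group of three words contiguous, which is the property a layout algorithm would rely on when placing words in sequence. The order within each group is not meaningful here, because the toy similarities inside a group are all equal.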
Related resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Search and Navigation in Semantically Integrated Document Collections
The paper presents a novel approach to semantic search and navigation in office-like document collections. The approach is based on a semantic document model that we have developed to enable unique identification, semantic annotation, and semantic linking of document units of office-like documents. In order to semantically annotate document units and to link semantically related document units, ...
A Wordification Approach to Relational Data Mining: Early Results
This paper describes a propositionalization technique called wordification. Wordification is inspired by text mining and can be seen as a transformation of a relational database into a corpus of documents. As in previous propositionalization methods, after the wordification step any propositional data mining algorithm can be applied. The most notable advantage of the presented technique is grea...
Semantically-Based Active Document Collection Templates for Web Information Management Systems
Representing and processing semantic information regarding individual documents is desirable but not sufficient. To improve the efficiency and reusability of users’ work with Web-based information management systems, it is essential to handle document collections. We describe techniques for representing semantics both of collections and of information management services that operate upon them....
In-Young Ko, Robert Neches, and Ke-Thia Yao: A Semantic Model and Composition Mechanism for Active Document Collection Templates in Web-based Information Management Systems
Representing semantic information embedded within documents is important, but not sufficient, for the Semantic Web vision of having machines automatically process data found on the Web. Many Web-based information management service tools, such as GeoWorlds [5,13], deal with collections of documents and the services that operate upon them. Semantic modeling of document collections and Web servic...
Journal: Comput. Graph. Forum
Volume: 31
Publication date: 2012